5  Paired t-test and ANOVA

In this chapter, we will be looking at paired t-tests and one-way ANOVA. Recall that:

6 Paired t-test

Photo by Dan Gold on Unsplash

The next dataset we will be using comes from the Living Well Multicultural – Lifestyle Modification Program. This file contains paired pre- and post-intervention measurements for adults who participated in the program between 2014 and 2017.

You can then save the file in your Datasets folder in your InferentialStats project folder that you created in the Introduction. You can give it a new name so that it is easier to call into R Studio.

If you open the file, you may notice that the first row is simply a title. We can ignore the title by including skip when we load in the data.

Once you’ve saved the file, you can load the dataset into R.

# Load in tidyverse package library(tidyverse) # Load in data set living_well <- read.csv("Data-sets/Cleaned_data_pre_post.csv", stringsAsFactors = TRUE)
# Load in tidyverse package
library(tidyverse)
# Load in data set 
living_well <- read.csv("Data-sets/Cleaned_data_pre_post.csv", 
                   stringsAsFactors = TRUE)

We can take a look at the data to ensure it has been loaded in correctly.

  1. What is the code for looking at the structure of the dataset?.

Type the code below.

str(living_well)
str(living_well)
  1. How many variables are there?
  2. How many observations are there?
unique(living_well$Time)
[1] Baseline   Post week8
Levels: Baseline Post week8

If we look at the Time variable, we can see that there are two levels: Baseline and Post week8. We also have an ID variable, which indicates that we have a repeated measurement for the baseline and post week 8 measurements. As we have one row for each measurement, this means our data is in long format. Another way to think about long format is that the same participant appears multiple times in the data set.

To be able to perform a paired t-test to compare the pre- and post-measurements, we need to convert the data to wide format. This means that we want one row per participant, with separate columns for each time point, such as a pre value and post value in the same row.

You may notice that there are many rows that have observations recorded as 999. At times, 999 is used as a placeholder for missing entries.

living_well_wide <- pivot_wider(
  living_well,
  id_cols = ID,               
  names_from = Time,          
  values_from = BMI           
)

You can see an error appears when we try to convert the data. This is telling us there are duplicates for the ID beyond the two that we are expecting. Let’s try using the code provided in the warning to see what’s going on.

living_well |>
  dplyr::summarise(n = dplyr::n(), .by = c(ID, Time)) |>
  dplyr::filter(n > 1L)
          ID       Time n
1  PIR201181   Baseline 2
2  PIR201181 Post week8 2
3 ABD 010187   Baseline 2
4 ABD 010187 Post week8 2

There are four IDs that have two baseline measurements and two post week8 measurements. Let’s investigate these rows closer.

living_well[living_well$ID == "PIR201181", c("ID",
                                             "Time", 
                                             "BMI", 
                                             "Weight") ]
           ID       Time  BMI Weight
647 PIR201181   Baseline 27.5   74.3
651 PIR201181   Baseline 21.0   50.4
655 PIR201181 Post week8 26.2   71.0
659 PIR201181 Post week8 21.0   49.7
living_well[living_well$ID == "ABD 010187", c("ID",
                                             "Time", 
                                             "BMI", 
                                             "Weight") ]
            ID       Time  BMI Weight
813 ABD 010187   Baseline 26.4   63.4
816 ABD 010187   Baseline 21.7   59.0
831 ABD 010187 Post week8 26.7   64.2
834 ABD 010187 Post week8 26.7   64.2

This confirms what we expected - that there are four measurements per ID, when we were expecting two. This was likely a data entry error, which can happen in real world datasets. It is impossible to tell which pre- and post-measurements match each other, so the safest option is to drop these observations. As we have 1112 observations, this shouldn’t impact our analysis too much.

bad_ID <- living_well |>
  dplyr::summarise(n = dplyr::n(), .by = c(ID, Time)) |>
  dplyr::filter(n > 1L) %>%
  distinct(ID)

living_well_clean <- living_well %>%
  filter(!ID %in% bad_ID$ID)

Let’s try convert our data to wide format again.

living_well_wide <- pivot_wider(
  living_well_clean,
  id_cols = ID,               
  names_from = Time,          
  values_from = BMI,
  # We include `names_glue()` as it converts our variable `Post week8` to `Post_week8`
  names_glue = "{str_replace_all(Time, ' ', '_')}"
)

No warning this time! That’s a good sign our method worked. Let’s take a look at our dataset now.

str(living_well_wide)
str(living_well_wide)
  1. How many variables are there?
  2. How many observations are there?

Remember, we have 552 observations now because we combined the pre- and post-measurements into one row. We started with 1112, removed 8, and then combined the two measurements which results in 552.

Now we can check if there are any 999 values and drop them.

any(living_well_wide$Baseline == 999, na.rm = TRUE)
[1] TRUE
any(living_well_wide$Post_week8== 999, na.rm = TRUE)
[1] TRUE

As we have TRUE for both variables, we can convert the 999 to NA and then remove.

living_well_wide <- living_well_wide %>%
  mutate(
    Baseline = na_if(Baseline, 999),
    Post_week8 = na_if(Post_week8, 999)
  ) %>%
  drop_na(Baseline, Post_week8)

any(living_well_wide$Baseline == 999, na.rm = TRUE)
[1] FALSE
any(living_well_wide$Post_week8== 999, na.rm = TRUE)
[1] FALSE

We have officially cleaned our data! We can now get into our statistical analysis.

Write Hypotheses

Because we are keeping the dataset simple, a natural research question is whether there is a significant difference

\(H_0:\) For participants in the Living Well Multicultural – Lifestyle Modification Program, the mean BMI after the program is not significantly different than the mean BMI before the program.

\(H_A:\) For participants in the Living Well Multicultural – Lifestyle Modification Program, the mean BMI after the program is significantly lower than the mean BMI before the program.

We can express this in mathematical notation:

\(H_0:\) \(\mu_{diff}=0\)

\(H_A:\) \(\mu_{diff} > 0\)

where \(diff = BMI_{Baseline} - BMI_{Post Week 8}\). Now we need to calculate the difference between the baseline BMI and Post Week 8 BMI . We can do that with:

living_well_wide$diff <- (living_well_wide$Baseline - living_well_wide$Post_week8)

living_well_wide$diff
  [1]  0.20  0.20  0.90  1.20  0.10  0.20  0.80  0.50  0.10  0.10  0.00  0.40
 [13]  1.50  0.30  0.30 -0.10  0.10  0.80  1.10  1.20  0.80  0.30 -0.06  0.26
 [25]  0.45  0.53  0.32  0.42 -0.01 -0.03  0.31  0.60  0.50  0.70  0.40  0.40
 [37]  1.50  0.40  0.40  0.30  0.40  0.40  0.30  0.70  0.00  1.60  1.00 -0.20
 [49] -0.60  0.40  0.20  0.10  0.40  1.00  0.50  1.20  0.50  0.30  0.00 -0.10
 [61]  0.70 -1.70 -4.90  0.40  0.10  0.20  0.30  0.00  0.70  0.90  0.10  0.90
 [73]  0.60  0.10  1.70  0.40  0.10  0.30  1.10  0.20  0.10  0.20  0.00  0.40
 [85]  0.10 -0.40  0.00  0.40  0.20  0.00  0.30 -0.10  0.00  0.80  1.00  0.80
 [97]  0.40 -0.20  1.50  0.30  0.30 -0.20  0.70  0.40 -0.30  1.40  0.70  0.60
[109]  0.50  0.30  0.30  1.80 -0.10  1.00  0.50  0.40 -0.20  0.90  0.50  0.50
[121]  1.00  0.50  0.80  0.30  1.30  0.60  1.10  0.70  0.70  0.60  0.90  0.80
[133]  0.80 -0.80  0.30  0.20  1.40  2.30 -0.70  0.70  1.10  0.40 -0.90  1.20
[145] -1.70  2.00  0.00  1.40  1.40  3.60  0.80  0.40  0.80  1.00  0.40  0.50
[157] -0.80  0.80  0.70  1.40  1.30  1.00  0.70 -0.20  0.70 -1.50 -2.10  2.00
[169] -1.10 -0.60  0.80  1.00  0.10 -0.40  2.10 -0.30  0.90 -0.30  0.70 -0.30
[181] -2.90 -0.50 -0.80  1.60  0.50  1.30  0.60  0.30  0.00  0.00  0.50  0.10
[193]  0.60  0.00  0.90  0.30  0.90  0.00  0.00 -0.40 -0.40 -1.40  0.30  0.10
[205] -1.60  0.00  0.20  0.50  0.20  0.20  0.00  0.00  0.50  0.00  0.20  0.40
[217] -0.30  0.30  0.00  0.10  0.40 -0.50  0.30  0.40 -0.90  0.00 -0.20 -0.50
[229]  0.80  0.30  0.60  0.10  0.10 -0.40  0.20  0.20  0.10  0.30  0.10  0.20
[241]  0.20  0.00  0.10  0.40  0.40  0.00  0.00  0.10  0.00  0.20  0.90  0.90
[253]  0.30  0.70  0.40  0.70  0.70  0.00  0.40  1.00  0.90  0.30  0.00  1.20
[265] -0.40  0.30  1.40  0.50 -0.30 -0.40 -0.10 -0.30  0.30 -0.40 -0.20  0.70
[277]  0.40  0.10  1.30  0.00  0.80 -0.70  0.40  0.30  0.00  0.40  0.50  0.00
[289] -2.00  0.40  0.40  0.40  0.60  0.60 -0.10  0.30  0.50  0.60 -0.30  0.30
[301] -0.30 -0.30  0.20  1.00  0.30  1.00  0.80 -0.20 -0.80  0.00 -1.00 -0.40
[313]  0.00  0.90 -0.50  0.20 -0.90 -0.30  0.60  0.10  0.40  0.10  0.30  0.50
[325]  1.20 -0.10  0.20  0.10  0.70  0.20  0.00  0.80  0.90 -0.90  0.50  0.20
[337]  0.30  0.50  0.80  1.40  0.60 -0.40  0.30  0.80  0.80 -0.20  0.10  0.20
[349] -0.10  0.80  0.50 -0.10  1.00  0.10  0.20  0.60  0.10 -0.80  0.30  0.20
[361]  0.60  0.70  0.80  0.70  0.20  0.90  0.60  0.60  0.40  1.10  1.60 -0.20
[373] -0.30  0.70  0.10 -0.60  0.30  0.30  0.30  0.30 -0.10  0.50  0.80 -0.30
[385]  0.70 -0.30  0.90 -0.90  0.10  1.50  1.60  0.90 -0.20 -0.30  0.70  1.20
[397]  1.90  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -0.10 -0.70  0.00
[409]  0.00  0.00 -0.10  0.70 -0.50  0.80  0.20  0.30  0.20  0.20  0.60  0.00
[421]  0.40  0.60  0.20  0.00  0.40  0.40  0.40 -0.50 -0.60 -0.50  1.10 -1.00
[433]  0.10  0.20  0.70  0.30 -1.10  0.60 -0.30 -0.80 -0.10  0.00 -0.30  0.30
[445]  0.70  0.10  0.20 -1.30 -0.30 -0.30 -0.10  1.20  0.60  0.30 -0.30  1.60
[457]  1.20  0.20  0.00  0.70  0.20  0.00  0.00  0.20 -0.10  0.40  0.30  0.60
[469]  0.30  1.20  1.40  0.70  0.20  0.10  0.00  0.20 -0.20 -0.50 -0.10  0.20
[481] -0.30 -0.30 -2.10  0.70 -0.90 -1.30  1.00  0.30  0.00  0.00 -0.70  0.90
[493] -0.90  0.20 -0.20 -0.40  0.40 -0.20 -0.40 -0.70  0.80  1.10 -0.40  1.80
[505]  0.30 -0.30  0.70  0.40  2.20  0.20  0.60  0.10  3.70  1.10  0.00  1.30
[517]  0.80  0.50  0.40  0.80  0.10  0.40 -0.10  0.10  0.70 -1.20  0.40  0.50
[529]  0.10  0.20  0.10  0.20  0.00  0.00  0.00  0.50  0.30  0.00  0.80 -0.20
[541]  0.00  0.80 -0.10  0.40  0.10 -3.80  0.90  0.20  0.00

We can attach the data set.

attach(living_well_wide)

We can check for skewness and normality.

ggplot(data=living_well_wide, aes(y=diff, x=""))+
  geom_boxplot(fill="mistyrose", col="pink4")+
  theme_bw()

ggplot(data=living_well_wide, aes(sample=diff))+
  geom_qq(col="navyblue")+
  geom_qq_line(col="steelblue")+
  labs(x="Theoretical quantiles", y="Sample quantiles")+
  theme_bw()

We can now run our t-test:

t.test(Baseline, Post_week8, paired = TRUE, alternative = "greater") 

    Paired t-test

data:  Baseline and Post_week8
t = 9.0896, df = 548, p-value < 2.2e-16
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 0.2275603       Inf
sample estimates:
mean difference 
      0.2779417 

The p-value (\(t_{548}=9.09, p=2.2*10^{-16}\)) is much less than the 5% significance level (p<0.05). We used greater because our hypothesis was that BMI would go down after the program. The t test calculates the difference as baseline minus post week 8, so if BMI really decreased, that difference would be positive. Using greater than tells R to test whether the average of those differences is greater than zero, meaning baseline values were higher than post week 8 values.

Therefore, the mean BMI after the program is significantly lower than the mean BMI before the program.

Let’s generate a two-sided paired t-test to get the confidence interval.

t.test(Baseline, Post_week8, paired = TRUE) 

    Paired t-test

data:  Baseline and Post_week8
t = 9.0896, df = 548, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.2178774 0.3380060
sample estimates:
mean difference 
      0.2779417 

With 95% confidence, there ia mean BMI difference of 0.22 and 0.34 BMI Because this interval includes zero, the data supports a real difference before and after the program. Note that if 0 was IN in the confidence interval, this would suggest there is not a significant difference.

detach(living_well_wide)

7 ANOVA

For the second part of this lesson, we are going to explore ANOVA.

We want to do ANOVA when we are comparing three or more groups (or categories) against a continuous response variable.

We can continue on with our living_well dataset, but instead refocus our research question and restructure the dataset accordingly.

An example of a research question we might ask with ANOVA is whether education level impacts post program BMI.

Let’s rewrite our hypotheses.

\(H_0:\) There is no difference in mean BMI at post week 8 between education groups in the Living Well Multicultural – Lifestyle Modification Program.

\(H_A:\) At least one education group has a different mean BMI at post week 8.

We can express this in mathematical notation:

\(H_0:\) \(\mu_1=\mu_2=\mu_3=...=\mu_7\)

\(H_A:\) At least one \(\mu_i\) is different.

Let’s tidy up the data.

living_well_aov <- living_well_clean %>%
  pivot_wider(
    id_cols = c(ID, Education),
    names_from = Time,
    values_from = BMI,
    names_glue = "{str_replace_all(Time, ' ', '_')}"
  )

Let’s be sure to remove any 999 values. How can we do this?

living_well_aov <- living_well_aov %>%
  mutate(Education = factor(replace(Education, Education == "999", NA))) %>%
  mutate(
    Baseline = na_if(Baseline, 999),
    Post_week8 = na_if(Post_week8, 999)
  ) %>%
  drop_na(Baseline, Post_week8, Education)

Remember, because Education is a categorical variable, 999 will be stored as a string rather than a numerical variable.

str(living_well_aov)
tibble [521 × 4] (S3: tbl_df/tbl/data.frame)
 $ ID        : Factor w/ 554 levels "ABA200879","ABD 010160",..: 295 294 38 296 191 192 297 10 453 452 ...
 $ Education : Factor w/ 7 levels "Bachelor degree",..: 5 5 3 5 4 4 1 3 1 1 ...
 $ Baseline  : num [1:521] 19.9 18.8 34.6 20.8 30.3 24.5 21.5 26.2 19.6 19.9 ...
 $ Post_week8: num [1:521] 19.7 18.6 33.7 19.6 30.2 24.3 20.7 25.7 19.5 19.8 ...
attach(living_well_aov)

For a one-way ANOVA we need to check

  • Independence

  • Normality of the response variable for each population.

  • All populations have the same standard deviation.

To determine these:

  • We typically assume independence, as long as e.g., a random sample has been collected and each observation is a unique case/subject/participant.

  • We can determine normality by considering side-by-side boxplots, or a quantile-quantile plot.

  • We can compute the standard deviation for each group using aggregate() like we tried last week. A rule of thumb is that we are likely safe if the largest standard deviation is less than double the smallest standard deviation.

ggplot(living_well_aov, aes(x=Education, y=Post_week8))+
  geom_boxplot(fill = c("#f94144",
                        "#f3722c",
                        "#f8961e",
                        "#f9c74f",
                        "#90be6d",
                        "#43aa8b",
                        "#577590"))+
  theme_bw() +   
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

ggplot(data=living_well_aov, aes(sample=Post_week8))+
  geom_qq()+
  geom_qq_line()+
  theme_bw()+
  facet_wrap(~Education)

aggregate(Post_week8~Education, FUN=sd)
                        Education Post_week8
1                 Bachelor degree   6.393231
2 Certificate (trade or business)   8.088750
3     Diploma or Associate degree   6.313799
4     High school (up to 4 years)   8.743657
5     High school (up to 6 years)   6.760607
6              Posgraduate degree   4.549479
7                  Primary school   7.602748
aggregate(Post_week8~Education, FUN=length)
                        Education Post_week8
1                 Bachelor degree         93
2 Certificate (trade or business)         54
3     Diploma or Associate degree         60
4     High school (up to 4 years)         69
5     High school (up to 6 years)         95
6              Posgraduate degree         48
7                  Primary school        102

Based on these outputs, do you think the necessary conditions for our one-way ANOVA are met?

Each individual is unique and only sampled randomly once, so independence is met.

The scatter in the q-q plot is reasonably straight and the boxplots are reasonably symmetrical about the median, so a normal distribution is met.

4.55/8.74
[1] 0.520595

\(4.55/8.74 = 0.52 < 2\)

The largest standard deviation is less than twice that of the smallest, so constant variance is met.

We can now run an ANOVA on the data. Note these are two lines of code and thus need to be run one after the other.

mod.aov <- lm(Post_week8~Education, data = living_well_aov)
anova(mod.aov)
Analysis of Variance Table

Response: Post_week8
           Df  Sum Sq Mean Sq F value  Pr(>F)  
Education   6   771.1 128.516  2.5519 0.01915 *
Residuals 514 25885.8  50.362                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The key values we can see are the F statistic, and the p-value that is associated with that. Given that our p-value is less than 0.05, we have evidence to reject the null hypothesis.

In regards to our research question we can say that:

There is a significant difference in mean BMI for at least some education levels.

We need to carefully check for differences between each group (correcting for multiple comparisons) to determine what is happening. To do this we can use a pairwise t-test with a Bonferroni correction. We use the Bonferroni adjustment because multiple pairwise comparisons increase the chance of finding a significant result just by chance. The adjustment makes the significance threshold more conservative, helping to control the overall Type I error rate (i.e. false positives) so that any detected differences are more likely to be real.

pairwise.t.test(Post_week8,Education,p.adj='bonf')

    Pairwise comparisons using t tests with pooled SD 

data:  Post_week8 and Education 

                                Bachelor degree Certificate (trade or business)
Certificate (trade or business) 0.254           -                              
Diploma or Associate degree     1.000           1.000                          
High school (up to 4 years)     0.581           1.000                          
High school (up to 6 years)     1.000           0.449                          
Posgraduate degree              1.000           0.066                          
Primary school                  1.000           1.000                          
                                Diploma or Associate degree
Certificate (trade or business) -                          
Diploma or Associate degree     -                          
High school (up to 4 years)     1.000                      
High school (up to 6 years)     1.000                      
Posgraduate degree              0.922                      
Primary school                  1.000                      
                                High school (up to 4 years)
Certificate (trade or business) -                          
Diploma or Associate degree     -                          
High school (up to 4 years)     -                          
High school (up to 6 years)     1.000                      
Posgraduate degree              0.147                      
Primary school                  1.000                      
                                High school (up to 6 years) Posgraduate degree
Certificate (trade or business) -                           -                 
Diploma or Associate degree     -                           -                 
High school (up to 4 years)     -                           -                 
High school (up to 6 years)     -                           -                 
Posgraduate degree              1.000                       -                 
Primary school                  1.000                       1.000             

P value adjustment method: bonferroni 

Since there are seven education groups, the pairwise t-tests compare the mean post-week-8 BMI between each pair of education levels. This means we can look at differences such as Bachelor degree vs Certificate (trade or business), Bachelor degree vs Diploma or Associate degree, or High school (up to 4 years) vs Postgraduate degree, and so on. The table shows the p-values for each of these comparisons, adjusted using the Bonferroni method.

The ANOVA showed a significant overall difference in post week 8 BMI between education groups (p = 0.019), meaning that at least one group had a different mean BMI compared to the others. However, when pairwise t tests were run with a Bonferroni adjustment, none of the individual group comparisons were statistically significant. This suggests that while there may be small differences in BMI across education levels overall, these differences are not strong enough between specific groups to remain significant once the stricter adjustment for multiple comparisons is applied.

Finally, lets calculate some confidence intervals for each group. If two groups have confidence intervals that do not overlap, it suggests their mean BMI values are likely different. If the intervals overlap, we can’t say there is a clear difference between those groups.

#Calculate CIs for each group
mod.aov.ci <- lm(Post_week8~Education-1, data = living_well_aov)
ci<-confint(mod.aov.ci)
ci
                                            2.5 %   97.5 %
EducationBachelor degree                 26.50376 29.39517
EducationCertificate (trade or business) 29.11016 32.90466
EducationDiploma or Associate degree     27.80344 31.40322
EducationHigh school (up to 4 years)     28.76072 32.11754
EducationHigh school (up to 6 years)     26.78538 29.64620
EducationPosgraduate degree              24.81475 28.83942
EducationPrimary school                  27.75092 30.51182
#Calculate means for each group
mean_lw<-aggregate(Post_week8 ~ Education, data = living_well_aov, FUN = mean)
#Add the means and CIs together in a dataframe to plot
values<-cbind(mean_lw,ci)
#Rename the column headings and row headings
colnames(values)<-c("group","mean", "lower","upper")
rownames(values)<-c(1, 2, 3, 4, 5, 6, 7)
values
                            group     mean    lower    upper
1                 Bachelor degree 27.94946 26.50376 29.39517
2 Certificate (trade or business) 31.00741 29.11016 32.90466
3     Diploma or Associate degree 29.60333 27.80344 31.40322
4     High school (up to 4 years) 30.43913 28.76072 32.11754
5     High school (up to 6 years) 28.21579 26.78538 29.64620
6              Posgraduate degree 26.82708 24.81475 28.83942
7                  Primary school 29.13137 27.75092 30.51182
#create errorbar plot
ggplot(data=values, aes(x=group, y=mean))+
  geom_errorbar(aes(ymin=lower,
                    ymax=upper), 
                linewidth=1.5, 
                col=c("#f94144",
                      "#f3722c",
                      "#f8961e",
                      "#f9c74f",
                      "#90be6d",
                      "#43aa8b",
                      "#577590"))+
  geom_point(size=2, 
             col="#0F151A")+
  theme_bw()+
  labs(y="BMI (mean+CIs)") +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, 
                                   hjust = 1))

Because these intervals overlap, it suggests that the mean post-week-8 BMI values are quite similar across education groups. There isn’t clear separation between any of the groups, which means we can’t say any one education group’s mean BMI is significantly different from the others based on these confidence intervals.

Therefore, the ANOVA indicated a significant overall difference in post week 8 BMI across education levels (p = 0.019), so the null hypothesis of equal means was rejected. However, post hoc tests and confidence intervals showed that no specific group comparisons were significant, suggesting that while some variation exists across education levels, the differences are small.